Low-cost Interference Mitigation and Relay Processing for Cooperative DS-CDMA Systems
In wireless communications, propagation effects such as fading, shadowing and path loss are the major constraints that limit overall system performance. Severe fading degrades the received signals and can compromise both the transmission of information and the reliability of the network. Diversity techniques are therefore introduced to mitigate fading. Among the various kinds of diversity techniques, cooperative diversity with relaying nodes has been widely studied in recent years as an effective countermeasure. Several cooperative protocols have been proposed in the literature; among the most effective are Amplify-and-Forward (AF) and Decode-and-Forward (DF).
Cooperative diversity can be combined with direct-sequence code-division multiple access (DS-CDMA) systems to further enhance information security. However, multiple-access interference (MAI), which arises from nonorthogonal received waveforms in DS-CDMA systems, can easily degrade system performance. To address this issue, a novel multiuser detection (MUD) technique is introduced as a relay processing strategy for the uplink of cooperative DS-CDMA systems. In addition, distributed space-time coding (DSTC) can be combined with cooperative diversity to further improve transmission performance. Moreover, to increase the throughput of the cooperative DS-CDMA network, a physical-layer network coding (PNC) scheme is adopted. Clearly, better performance gains and lower power consumption can be obtained when appropriate relaying strategies are applied.
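As an illustration of the AF protocol described above, here is a toy baseband sketch with real-valued BPSK, invented channel gains, and a simplified noise model; it is not the paper's system model, only the generic relaying idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def amplify_and_forward(symbol, h_sd, h_sr, h_rd, snr_db=10.0):
    """Toy AF relaying: the relay scales its noisy observation and
    forwards it; the destination combines both copies with channel-matched
    weights. Channel gains and the noise model are illustrative only."""
    noise_std = 10 ** (-snr_db / 20)
    # Direct link: source -> destination
    y_sd = h_sd * symbol + noise_std * rng.standard_normal()
    # Relayed link: source -> relay, amplify, relay -> destination
    y_sr = h_sr * symbol + noise_std * rng.standard_normal()
    gain = 1.0 / np.sqrt(h_sr**2 + noise_std**2)  # relay power normalization
    y_rd = h_rd * gain * y_sr + noise_std * rng.standard_normal()
    # Combine the two observations (MRC-style, known channel gains)
    combined = h_sd * y_sd + (h_rd * gain * h_sr) * y_rd
    return np.sign(combined)  # hard BPSK decision

# BPSK symbol +1 sent over fixed example fades
est = amplify_and_forward(+1.0, h_sd=0.9, h_sr=1.1, h_rd=0.8)
```

The second, relayed copy is what supplies the diversity: when the direct link fades badly, the relayed path can still carry the decision statistic above the noise.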
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Transformers have achieved great success in machine learning applications.
Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root
Mean Square Normalization (RMSNorm), play a critical role in accelerating and
stabilizing the training of Transformers. While LayerNorm recenters and
rescales input vectors, RMSNorm only rescales the vectors by their RMS value.
Despite being more computationally efficient, RMSNorm may compromise the
representation ability of Transformers. There is currently no consensus
regarding the preferred normalization technique, as some models employ
LayerNorm while others utilize RMSNorm, especially in recent large language
models. It is challenging to convert Transformers with one normalization to the
other type. While there is an ongoing disagreement between the two
normalization types, we propose a solution to unify two mainstream Transformer
architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherent
redundant mean information in the main branch of Pre-LN Transformers, we can
reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose
the Compressed RMSNorm (CRMSNorm) and Pre-CRMSNorm Transformer based on a
lossless compression of the zero-mean vectors. We formally establish the
equivalence of Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in
both training and inference. This implies that Pre-LN Transformers can be
substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering the
same arithmetic functionality along with free efficiency improvement.
Experiments demonstrate that we can reduce the training and inference time of
Pre-LN Transformers by up to 10%.
Comment: 15 pages, 5 tables, code available at
https://github.com/ZixuanJiang/pre-rmsnorm-transforme
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
In this work, we propose FastCoT, a model-agnostic framework based on
parallel decoding without any further training of an auxiliary model or
modification to the LLM itself. FastCoT uses a context window whose size
varies with position to conduct parallel decoding and auto-regressive
decoding simultaneously, thus fully utilizing GPU computation
resources. In FastCoT, the parallel decoding part provides the LLM with a quick
glance of the future composed of approximate tokens, which could lead to faster
answers compared to regular autoregressive decoding used by causal
transformers. We also provide an implementation of parallel decoding within
LLM, which supports KV-cache generation and batch processing. Through extensive
experiments, we demonstrate that FastCoT saves inference time by nearly 20%
with only a negligible performance drop compared to the regular approach.
Additionally, we show that the context window size exhibits considerable
robustness across different tasks.
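The underlying mechanism, that parallel (Jacobi-style) decoding produces approximate future tokens which converge to the autoregressive answer, can be illustrated with a toy deterministic next-token function. Everything below is a stand-in for an LLM, not the paper's implementation:

```python
def toy_next_token(prev):
    # Stand-in for an LLM's next-token function (illustrative only)
    return (prev * 3 + 1) % 17

def autoregressive(first, n):
    # Regular decoding: one token per step, n - 1 sequential steps
    seq = [first]
    for _ in range(n - 1):
        seq.append(toy_next_token(seq[-1]))
    return seq

def jacobi_decode(first, n):
    """Jacobi-style parallel decoding: start from a crude draft of
    approximate tokens and refine every position at once per iteration;
    the result matches autoregressive decoding, and each iteration fixes
    at least one more position (often several when drafts are good)."""
    seq = [first] + [0] * (n - 1)          # crude draft of future tokens
    for it in range(n):
        new = [first] + [toy_next_token(t) for t in seq[:-1]]
        if new == seq:                      # converged: a fixed point
            return seq, it
        seq = new
    return seq, n

target = autoregressive(5, 8)
decoded, iters = jacobi_decode(5, 8)
assert decoded == target
```

With a perfect draft the loop would exit in one iteration; the "glance of the future" in FastCoT plays the role of improving that draft so fewer sequential refinements are needed.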
Robust inference with GhostKnockoffs in genome-wide association studies
Genome-wide association studies (GWASs) have been extensively adopted to
depict the underlying genetic architecture of complex diseases. Motivated by
GWASs' limitations in identifying small effect loci to understand complex
traits' polygenicity and fine-mapping putative causal variants from proxy ones,
we propose a knockoff-based method which only requires summary statistics from
GWASs and demonstrate its validity in the presence of relatedness. We show that
GhostKnockoffs inference is robust to its input Z-scores as long as they are
from valid marginal association tests and their correlations are consistent
with the correlations among the corresponding genetic variants. This property
generalizes GhostKnockoffs to other GWAS settings, such as the meta-analysis
of multiple overlapping studies and studies based on association test
statistics that deviate from score tests. We demonstrate GhostKnockoffs'
performance using empirical simulations and a meta-analysis of nine
European-ancestry genome-wide association studies and whole exome/genome
sequencing studies. Both results demonstrate that GhostKnockoffs identifies
more putative causal variants with weak genotype-phenotype associations that
are missed by conventional GWASs.
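The selection step downstream of any knockoff construction is the standard knockoff filter; a sketch with toy feature-importance statistics W standing in for the ones GhostKnockoffs derives from GWAS summary Z-scores:

```python
import numpy as np

def knockoff_threshold(W, fdr=0.1):
    """Knockoff filter threshold: the smallest t such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= fdr.
    Variables with W_j >= t are selected, with FDR control at level fdr."""
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        ratio = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= fdr:
            return t
    return np.inf  # nothing selectable at this FDR level

# Toy W statistics: large positive values suggest true signals,
# symmetric-around-zero noise corresponds to null variants
W = np.array([5.2, 4.8, 3.9, 3.1, -0.4, 0.3, -0.2, 0.1, 2.5, -0.6])
t = knockoff_threshold(W, fdr=0.2)
selected = np.where(W >= t)[0]
```

Here the negative W values act as an internal estimate of false discoveries, which is what gives the procedure its finite-sample FDR guarantee regardless of how the Z-scores were produced.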
Second-order group knockoffs with applications to GWAS
Conditional testing via the knockoff framework allows one to identify --
among a large number of possible explanatory variables -- those that carry unique
information about an outcome of interest, and also provides a false discovery
rate guarantee on the selection. This approach is particularly well suited to
the analysis of genome-wide association studies (GWAS), which have the goal of
identifying genetic variants which influence traits of medical relevance.
While conditional testing can be both more powerful and precise than
traditional GWAS analysis methods, its vanilla implementation encounters a
difficulty common to all multivariate analysis methods: it is challenging to
distinguish among multiple, highly correlated regressors. This impasse can be
overcome by shifting the object of inference from single variables to groups of
correlated variables. To achieve this, it is necessary to construct "group
knockoffs." While successful examples are already documented in the literature,
this paper substantially expands the set of algorithms and software for group
knockoffs. We focus in particular on second-order knockoffs, for which we
describe correlation matrix approximations that are appropriate for GWAS data
and that result in considerable computational savings. We illustrate the
effectiveness of the proposed methods with simulations and with the analysis of
albuminuria data from the UK Biobank.
The described algorithms are implemented in an open-source Julia package
Knockoffs.jl, for which both R and Python wrappers are available.
Comment: 46 pages, 10 figures, 2 tables, 3 algorithms
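A minimal sketch of second-order Gaussian knockoff construction with the simple equi-correlated choice S = sI is below; the group knockoffs of the paper generalize S to a block-diagonal matrix matched to groups of correlated variants, and Knockoffs.jl implements the actual algorithms:

```python
import numpy as np

def equi_second_order_knockoffs(X, Sigma, rng):
    """Second-order Gaussian knockoffs with S = s*I,
    s = min(2*lambda_min(Sigma), 1), for a correlation matrix Sigma.
    Knockoffs match X's covariance and have cross-covariance Sigma - S."""
    n, p = X.shape
    s = min(2 * np.linalg.eigvalsh(Sigma).min(), 1.0)
    S = s * np.eye(p)
    Sigma_inv_S = np.linalg.solve(Sigma, S)
    mu = X - X @ Sigma_inv_S                 # conditional mean of knockoffs
    V = 2 * S - S @ Sigma_inv_S              # conditional covariance
    V = (V + V.T) / 2                        # symmetrize for Cholesky
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    return mu + rng.standard_normal((n, p)) @ L.T

rng = np.random.default_rng(1)
p = 4
Sigma = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)   # toy correlation matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=5000)
Xk = equi_second_order_knockoffs(X, Sigma, rng)
```

The "second-order" name refers to matching only these first two moments; the correlation-matrix approximations described in the paper make the Sigma-dependent algebra above tractable at GWAS scale.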
A compact butterfly-style silicon photonic-electronic neural chip for hardware-efficient deep learning
The optical neural network (ONN) is a promising hardware platform for
next-generation neurocomputing due to its high parallelism, low latency, and
low energy consumption. Previous ONN architectures are mainly designed for
general matrix multiplication (GEMM), leading to unnecessarily large area cost
and high control complexity. Here, we move beyond classical GEMM-based ONNs and
propose an optical subspace neural network (OSNN) architecture, which trades
the universality of weight representation for lower optical component usage,
area cost, and energy consumption. We devise a butterfly-style
photonic-electronic neural chip to implement our OSNN with up to 7x fewer
trainable optical components compared to GEMM-based ONNs. Additionally, a
hardware-aware training framework is provided to minimize the required device
programming precision, lessen the chip area, and boost the noise robustness. We
experimentally demonstrate the utility of our neural chip in practical image
recognition tasks, showing that a measured accuracy of 94.16% can be achieved
in hand-written digit recognition tasks with 3-bit weight programming
precision.
Comment: 17 pages, 5 figures
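The trade-off behind the butterfly structure can be seen in software: a classic butterfly factorization applies log2(n) stages of 2x2 rotations to strided pairs, using n*log2(n)/2 parameters instead of n^2 for a dense layer. This numerical sketch illustrates the structure only, not the chip's photonic transfer matrices:

```python
import numpy as np

def butterfly_layer(x, thetas):
    """Apply log2(n) butterfly stages of parameterized 2x2 rotations to
    strided element pairs of a length-n vector (n a power of two)."""
    n = len(x)
    y = x.astype(float).copy()
    k = 0
    stride = 1
    while stride < n:
        for start in range(0, n, 2 * stride):
            for i in range(start, start + stride):
                a, b = y[i], y[i + stride]
                c, s = np.cos(thetas[k]), np.sin(thetas[k])
                y[i] = c * a - s * b
                y[i + stride] = s * a + c * b
                k += 1
        stride *= 2
    return y

n = 8
num_params = (n // 2) * int(np.log2(n))   # 12 rotations vs 64 dense weights
rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, num_params)
x = rng.standard_normal(n)
y = butterfly_layer(x, thetas)
# Each 2x2 rotation is orthogonal, so the transform preserves vector norm
```

Restricting the weight space to such structured transforms is exactly the universality-for-efficiency trade the OSNN makes: fewer trainable components, at the cost of not representing arbitrary matrices.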